-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address --
All claims being allowable, PROSECUTION ON THE MERITS IS (OR REMAINS) CLOSED in this application. If not included 
herewith (or previously mailed), a Notice of Allowance (PTOL-85) or other appropriate communication will be mailed in due course. THIS 
NOTICE OF ALLOWABILITY IS NOT A GRANT OF PATENT RIGHTS. This application is subject to withdrawal from issue at the initiative 
of the Office or upon petition by the applicant. See 37 CFR 1.313 and MPEP 1308. 

1. ☒ This communication is responsive to 4/19/2007.

2. ☒ The allowed claim(s) is/are 2-5, 7, 8, 10 and 11.

3. ☒ Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).
1. ☒ Certified copies of the priority documents have been received.

2. □ Certified copies of the priority documents have been received in Application No. . 

3. □ Copies of the certified copies of the priority documents have been received in this national stage application from the
International Bureau (PCT Rule 17.2(a)).
5. □ CORRECTED DRAWINGS (as "replacement sheets") must be submitted.

(a) □ including changes required by the Notice of Draftsperson's Patent Drawing Review (PTO-948) attached

1) □ hereto or 2) □ to Paper No./Mail Date . 
EXAMINER'S AMENDMENT

1. An examiner's amendment to the record appears below. Should the changes
and/or additions be unacceptable to applicant, an amendment may be filed as provided
by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be
submitted no later than the payment of the issue fee.

2. Authorization for this examiner's amendment was given in a telephone interview 
with Bogdan Zinchenko on June 8, 2007. 

3. Please amend claims 2, 7, 8, 10 and 11 as shown in the attached amendment
faxed in by the applicant on June 8, 2007.

4. The Examiner's Amendment has been made to correct minor informalities
and to overcome a possible rejection under 35 U.S.C. 112, second paragraph, thereby placing
this application in condition for allowance.



Allowable Subject Matter 

5. Claims 2-5, 7, 8, 10 and 11 are allowed.

6. The following is an examiner's statement of reasons for allowance: 

As to claims 2, 7, 8, 10 and 11, the prior art fails to teach a document extracting
apparatus and method comprising the steps of: acquiring a plurality of documents from
an information source according to user-specific criteria; computing all degrees of
similarity between the plurality of documents and expressing the degrees of similarity in a
symmetric matrix; computing all combinations of the degrees computed between the
plurality of documents; computing a sum of the degrees of similarity between all of the
documents in each combination; and extracting the documents constituting the combination
with the smallest sum of the degrees of similarity among the plurality of documents
constituting the respective combinations, wherein the similarity between the documents is
computed in the manner recited in claims 2, 7, 8, 10 and 11.
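The combination search summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: it assumes the symmetric similarity matrix has already been computed, and it assumes the number of documents per combination is a fixed size `k` chosen by the caller (a one-document combination would otherwise trivially have a sum of zero). The names `pair_sum` and `extract_least_similar` are hypothetical.

```python
from itertools import combinations

def pair_sum(sim, combo):
    """Sum of sim[i][j] over every unordered pair (i, j) inside one combination."""
    return sum(sim[i][j] for i, j in combinations(combo, 2))

def extract_least_similar(sim, k):
    """Return the k-document combination whose pairwise similarity sum is smallest.

    Assumptions: sim is a symmetric n x n matrix (sim[i][j] == sim[j][i]) and
    k is a caller-chosen combination size. Every C(n, k) combination is scored.
    """
    n = len(sim)
    return min(combinations(range(n), k), key=lambda combo: pair_sum(sim, combo))

# Illustrative 4-document similarity matrix (made-up values).
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.15],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.15, 0.8, 1.0],
]
print(extract_least_similar(sim, 2))  # (0, 2)
```

Because every one of the C(n, k) combinations is scored, the cost grows combinatorially with the number of candidate documents, so an exhaustive search like this is only practical for small candidate sets.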

As to claims 3-5, those claims are allowed by virtue of their dependency
upon allowed claim 2.

7. Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 

The Prior Art 

8. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Gomes et al. (US Patent 6,615,209) disclose a method for comparing two
documents for similarity.

Seki et al. (US Publication 2002/0143737) disclose an information
retrieval device capable of comparing two documents and detecting
repetitions.
Inquiry



9. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Angela M. Lie, whose telephone number is 571-272-8445.
The examiner can normally be reached Monday through Friday.

10. If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Don Wong, can be reached on 571-272-1834. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

11. Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 




Angela M Lie 




DON WONG
SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER 2100
Application Serial No. 10/731,164
Proposed Examiner's Amendment 

1. (Canceled) 

2. (Currently Amended) A document extracting apparatus, comprising: 

a document acquiring device to acquire a plurality of documents from an 
information source, according to a user-specific criteria, to be candidates for extraction; 

a similarity computing device to compute all degrees of similarity between the 
plurality of documents, and express the degrees of similarity in a symmetric matrix, 
the similarity computing device comprising: 

a character-string-dividing functional unit to divide each of the plurality of 
documents into predetermined character strings; 

a character-string frequency computing functional unit to compute 
document vectors of the plurality of documents on the basis of a frequency of appearance of the 
predetermined character strings divided by the character-string-dividing functional unit; and 

a mutual similarity computing functional unit to compute the degrees of 
similarity between the plurality of documents on the basis of the document vectors obtained from 
the character-string frequency computing functional unit; 

a combination computing device to compute all combinations of any number of
documents from the degrees of similarity computed between the plurality of documents;

a sum of degrees of similarity computing device to compute, with respect to all of 
the combinations, a sum of the degrees of similarity between all of the documents that constitute 
each combination, based on all of the degrees of similarity expressed in the symmetric matrix; 
and 
a document extracting device to extract documents constituting the combination
with the smallest sum of the degrees of similarity among the plurality of documents constituting 
the respective combinations. 

3. (Previously Presented) The document extracting apparatus according to Claim 2, 
the character-string-dividing functional unit dividing each of the plurality of
documents into predetermined character strings using any of the following character string
division methods: a morphological analysis method, an n-gram method, and a stop-word method. 
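Two of the division methods named in claim 3 can be sketched as follows. The claim does not define them further, so this code assumes the n-gram method means overlapping character n-grams and that the stop-word method divides text into the runs of words lying between stop words. The stop-word list and the function names `char_ngrams` and `stop_word_split` are hypothetical. A morphological analysis method typically relies on a language-specific analyzer and is not shown.

```python
# Hypothetical stop-word list used only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "in"}

def char_ngrams(text, n=2):
    """n-gram method (assumed form): overlapping character substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def stop_word_split(text, stop_words=STOP_WORDS):
    """Stop-word method (assumed form): divide text at stop words, keeping the runs between them."""
    segments, current = [], []
    for word in text.lower().split():
        if word in stop_words:
            if current:  # a stop word closes the current run of words
                segments.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:  # flush the final run
        segments.append(" ".join(current))
    return segments

print(char_ngrams("patent"))  # ['pa', 'at', 'te', 'en', 'nt']
print(stop_word_split("The document extracting apparatus of the invention"))
# ['document extracting apparatus', 'invention']
```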

4. (Previously Presented) The document extracting apparatus according to Claim 2, 
the character-string frequency computing functional unit generating document
vectors obtained by weighting each of the plurality of documents by a term frequency and inverse
document frequency (TFIDF) weighting method on the basis of a frequency of appearance of the 
divided character strings. 
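Claim 4 names TFIDF weighting without fixing a formula. The sketch below assumes one common variant, the raw count of a string in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing that string. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs_tokens):
    """Return one {string: weight} dict per document.

    Assumed variant: weight = (count of the string in the document) * log(N / df),
    where N = number of documents and df = number of documents containing the string.
    """
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # count each string at most once per document
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

docs = [["sum", "matrix", "sum"], ["matrix", "vector"]]
vecs = tfidf_vectors(docs)
print(vecs[0]["matrix"])  # 0.0, since "matrix" appears in every document
```

Under this variant a string that appears in every candidate document gets weight zero, so it contributes nothing to the later similarity computation.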

5. (Previously Presented) The document extracting apparatus according to Claim 2, 
the mutual similarity computing functional unit computing degrees of similarity
between the plurality of documents by a vector space method on the basis of the document
vectors of the plurality of documents. 
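Claim 5 recites a vector space method without naming a measure. A minimal sketch, assuming cosine similarity over sparse `{string: weight}` vectors such as those produced by a TFIDF step, also fills the symmetric matrix recited in claim 2. The names `cosine` and `similarity_matrix` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {string: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Symmetric n x n matrix of pairwise cosine similarities."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # compute the upper triangle, mirror it below
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim

vecs = [{"a": 1.0, "b": 1.0}, {"a": 1.0}, {"c": 2.0}]
sim = similarity_matrix(vecs)
print(round(sim[0][1], 4))  # 0.7071
```

Filling only the upper triangle and copying it to the lower triangle halves the number of similarity computations, since sim[i][j] equals sim[j][i].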

6. (Canceled) 

7. (Currently Amended) A computer-readable media having a document extracting 
program allowing a computer to serve as: 

a document acquiring device to acquire a plurality of documents from an 
information source, according to a user-specific criteria, to be candidates for extraction; 

a similarity computing device to compute all degrees of similarity between the 
plurality of documents, and express the degrees of similarity in a symmetric matrix, 

the similarity computing device comprising:

a character-string-dividing function to divide each of the plurality of
documents into predetermined character strings; 

a character-string frequency computing function to compute document 
vectors of the plurality of documents on the basis of a frequency of appearance of the 
predetermined character strings divided by the character-string-dividing function; and 

a mutual similarity computing function to compute the degrees of 
similarity between the plurality of documents on the basis of the document vectors obtained by 
the character-string frequency computing function; 

a combination computing device to compute all combinations of any number of 
documents from the degrees of similarity computed between the plurality of documents;

a sum of degrees of similarity computing device to compute, with respect to all of 
the combinations, a sum of the degrees of similarity between all of the documents that constitute 
each combination, based on all of the degrees of similarity expressed in the symmetric matrix; 
and 

a document extracting device to extract documents constituting the combination 
with the smallest sum of the degrees of similarity among the plurality of documents constituting 
the respective combinations. 

8. (Currently Amended) A computer-readable media having a document extracting 
program allowing a computer to serve as: 

a document acquiring device to acquire a plurality of documents from an 
information source, according to a user-specific criteria, to be candidates for extraction; 
a similarity computing device to compute all degrees of similarity between the
plurality of documents, and express the degrees of similarity in a symmetric matrix, 
the similarity computing device comprising: 

a character-string-dividing function to divide each of the plurality of 
documents into character strings using any one of character string division methods; 

a character-string frequency computing function to generate document 
vectors obtained by weighting each of the documents by a term frequency and inverse document 
frequency (TFIDF) weighting method on the basis of a frequency of appearance of the divided 
character strings; and 

a mutual similarity computing function to compute the degrees of 
similarity between the plurality of documents by a vector space method on the basis of the 
document vectors of the plurality of documents, 

a combination computing device to compute all combinations of any number of
documents from the degrees of similarity computed between the plurality of documents; 

a sum of degrees of similarity computing device to compute, with respect to all of 
the combinations, a sum of the degrees of similarity between all of the documents that constitute 
each combination, based on all of the degrees of similarity expressed in the symmetric matrix; 
and 

a document extracting device to extract documents constituting the combination 
with the smallest sum of the degrees of similarity among the plurality of documents constituting 
the respective combinations. 
9. (Canceled) 
10. (Currently Amended) A document extracting method, comprising:

acquiring a plurality of documents from an information source, according to a 
user-specific criteria, to be candidates for extraction; 

dividing each of the documents into predetermined character strings, computing a
frequency of appearance of the divided character strings, computing document vectors of the
plurality of documents on the basis of the frequency of appearance of the predetermined character
strings, and computing the degrees of similarity between the plurality of documents using the
document vectors;

computing all degrees of similarity between the plurality of documents, and
expressing the degrees of similarity in a symmetric matrix; 

computing all combinations of any number of documents from the degrees of
similarity computed between the plurality of documents;

computing, with respect to all of the combinations, a sum of the degrees of 
similarity between all of the documents that constitute each combination, based on all of the 
degrees of similarity expressed in the symmetric matrix; and 

extracting documents constituting the combination with the smallest sum of the
degrees of similarity among the plurality of documents constituting the respective combinations.

[Deleted (struck through): combinations;
dividing each of the documents into predetermined character strings, computing a
frequency of appearance of the divided character strings, computing document vectors of the
plurality of documents on the basis of the frequency of appearance of the predetermined character
strings, and then computing the degrees of similarity between the plurality of documents using the
document vectors.]
11. (Currently Amended) A document extracting method, comprising:

acquiring a plurality of documents from an information source, according to a
user-specific criteria, to be candidates for extraction; 

dividing each of the plurality of documents into predetermined character strings
using any one of character string division methods, including a morphological analysis method,
an n-gram method, and a stop-word method, computing document vectors of the plurality of
documents by weighting each of the documents by a term frequency and inverse document
frequency (TFIDF) weighting method on the basis of a frequency of appearance of the divided
predetermined character strings, and computing the degrees of similarity between the plurality of
documents using a vector space method on the basis of the document vectors; 

computing all degrees of similarity between the plurality of documents, and 
expressing the degrees of similarity in a symmetric matrix; 

computing all combinations of any number of documents from the degrees of
similarity computed between the plurality of documents; 

computing, with respect to all of the combinations, a sum of the degrees of 
similarity between all of the documents that constitute each combination, based on all of the 
degrees of similarity expressed in the symmetric matrix; and 

extracting documents constituting the combination with the smallest sum of the
degrees of similarity among the plurality of documents constituting the respective combinations.

[Deleted (struck through): combinations;
dividing each of the plurality of documents into predetermined character strings
using any one of character string division methods, including a morphological analysis method,
an n-gram method, and a stop-word method, computing document vectors of the plurality of
documents by weighting each of the documents by a term frequency and inverse document
frequency (TFIDF) weighting method on the basis of a frequency of appearance of the divided
predetermined character strings, and computing the degrees of similarity between the plurality of
documents using a vector space method on the basis of the document vectors.]
12-14. (Canceled) 